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Abstract 

We describe the application of the unbiased se- 
quential analysis algorithm developed by Dee and 
da Silva (1998) to the GEOS DAS moisture anal- 
ysis. The algorithm estimates the persistent com- 
ponent of model error using rawinsonde observa- 
tions and adjusts the first-guess moisture field ac- 
cordingly. Results of two seasonal data assimilation 
cycles show that moisture analysis bias is almost 
completely eliminated in all observed regions. The 
improved analyses cause a sizable reduction in the 
6h-forecast bias and a marginal improvement in the 
error standard deviations. 



1 Introduction 


Hidden beneath the computational complexities of atmospheric data assimila- 
tion systems lies a multitude of assumptions about the errors associated with 
observing and predicting atmospheric fields. Most of these assumptions are 
there for practical reasons, either because there is not enough information to 
remove them, or because they result in critical computational simplifications. 
Some, however, are known to be false and could be relaxed without too much 
difficulty, with a potentially large benefit to analysis accuracy. 

A case in point is the standard assumption that short-term model forecasts, 
which are used as first guess fields for the analyses, are unbiased. There is plenty 
of evidence to the contrary. For example, Figure 1 shows the means and stan- 
dard deviations of the differences between observed and forecast atmospheric 
water vapor mixing ratios, computed from January 1998 rawinsonde station 
data in four separate regions. The 6/i-forecasts were produced by the Goddard 
Earth Observing System Data Assimilation System, Version 2.8 (GEOS DAS 
2.8), which we describe in Section 4. The statistics show that the systematic 
component of the observed-minus-forecast residuals (O-F) is not insignificant, 
especially in the Tropics and throughout the upper troposphere. Assuming that 
the mean observation error is small, this invalidates the assumption that the 
forecast is unbiased. 


[Figure 1 about here.] 


Details of the forecast bias obviously depend on the particulars of the general 
circulation model (GCM) that produced the forecast. They also depend on the 
accuracy of the analysis used to initialize the model, which, in turn, is partly 
determined by the quality and types of observations that entered into the anal- 
ysis. However, the magnitude of GEOS moisture bias is not atypical. Monthly 
statistics of specific humidity O-F’s produced by the operational DAS of the 
European Centre for Medium-Range Weather Forecasts (ECMWF) show com- 
parable biases and standard deviations (F. Lalaurette 1999, pers. comm.). The 
ECMWF DAS operates at a higher spatial resolution than the GEOS DAS, it 
uses a different analysis method, and it assimilates TIROS Operational Vertical 
Sounder (TOVS) as well as rawinsonde moisture data. The main cause of bias 
in the moisture fields produced by the two systems appears to be related to the 
model parameterizations of deep convection and cloud microphysics, which are 
perhaps inadequate in all current-generation GCMs (Chen et al 1998). 

The analysis produced by GEOS DAS 2.8 is a weighted average of the 6/i-forecast 
and the available observations. Analysis weights are derived from assumptions 
about the relative accuracies of these two sources of information, which, espe- 
cially in the case of moisture, are not very well known. Regardless, the analysis 
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inherits a fraction of the forecast bias, simply because of averaging. To illustrate, 
we also show in Figure 1 the means and standard deviations of the observed- 
minus-analysis differences (0-A). Although the amplitude of the bias has been 
reduced by half, its sign is everywhere the same as that of the forecast bias. 
Applying more weight to the observations will reduce the bias but increase the 
random component of analysis error. It is generally not possible to produce an 
unbiased analysis from a biased forecast, unless a reasonable estimate of the 
forecast bias is available. 

The purpose of this paper is to describe the implementation in GEOS DAS of the 
unbiased sequential analysis algorithm developed by Dee and da Silva (1998) 
(DdS). The algorithm estimates forecast bias from observations and corrects 
the first guess accordingly. Since the bias estimate is continuously updated, it 
is more accurately described as an estimate of the slowly varying component 
of forecast error. The present implementation, which we refer to as GEOS 
DAS BC, is limited to the moisture field and based on rawinsonde data only. 
As it turns out, the errors in the 6h-moisture forecasts contain relatively large 
persistent components, which are easily captured by the algorithm. Figure 2 
displays the January 1998 O-F and O-A statistics for GEOS DAS BC, for the 
same regions as before. The bias has all but disappeared, and even the standard 
deviations are reduced, albeit by a very small amount. These results were 
obtained simply by incorporating the forecast bias estimation in the analysis. 
No other modifications were made to the DAS; in particular, the forecast and 
observation error covariance models were left unchanged. 


[Figure 2 about here.] 


Some clarification of terms may be helpful at this point. Bias generally refers 
to a non-zero mean error. In theory the mean and other statistics are defined in 
terms of a hypothetical ensemble of realizations and its associated probability 
density. In practical applications, however, the bias is usually defined as a time 
average of a single realization of the error, taken over a finite time interval. This 
quantity is spatially variable and, if the errors are bounded, it can evolve on a 
time scale that is comparable with the length of the averaging interval. This 
length may vary, but our results are usually stated in terms of monthly means. 
The term systematic error is loosely applied in this paper to any type of error 
caused by an inherent, persistent deficiency in the model or in the observing 
system. In a nonlinear system systematic errors are necessarily state- dependent. 
Bias is a particular manifestation of systematic errors. 

Earlier work addressing systematic forecast errors in the context of data as- 
similation was done at the former National Meteorological Center by Thiebaux 
and Morone (1990) and Saha (1992), and at NASA’s Data Assimilation Office 
by Takacs (1996). In each of these studies, forecast bias estimates were de- 
rived from analyses rather than from observations. Griffith and Nichols (1996) 
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consider the treatment of model error, and the bias problem in particular, by 
means of adjoint methods. DelSoIe and Hou (1999) explore the possibility of 
constructing state-dependent empirical corrections, also derived from analyses, 
in order to account for systematic errors in the forecast model. 

The remainder of this paper explains the details of our GEOS DAS BC imple- 
mentation and experimental results. In Section 2 we take a closer look at time 
series of mixing ratio residuals produced by GEOS DAS 2.8, to better under- 
stand the manifestations of systematic errors in the model forecasts. We briefly 
review the unbiased sequential analysis algorithm in Section 3. There we also 
discuss the specification of a covariance model for the bias estimation errors, 
as well as other general implementation aspects. In an Appendix we derive the 
optimal weights for the unbiased analysis equations under a reasonable set of 
assumptions; this issue was not settled in the original presentation of the algo- 
rithm in DdS. In Section 4 we describe the specifics of the implementation in 
GEOS DAS and present results of two seasonal data assimilation experiments 
with the new method. Section 5 contains our conclusions and plans for future 
work. 


2 Evidence of systematic model errors 


We can detect systematic model errors by comparing forecasts with observa- 
tions. Non-zero mean residuals, computed over suitably long time periods and 
over a reasonably large set of stations, can be attributed to systematic model 
errors, provided that the mean observation errors are small. This will be the 
case if systematic errors in the data have been effectively removed. It is stan- 
dard practice, for example, to correct rawinsonde height data at high altitudes 
for the effects of solar and infrared radiation (Mitchell et al. 1996). Recent work 
by Zipser and Johnson (1998) shows that humidity soundings may also be con- 
taminated by instrument-dependent systematic errors, but there is currently no 
practical way to correct these data in real time. Nevertheless, quality-controlled 
rawinsonde observations continue to serve as a benchmark for all other estimates 
of atmospheric moisture content. 

The mixing ratio residual statistics shown in Figure 1 represent averages over a 
month of station data from four specific regions. The monthly means and stan- 
dard deviations are similar during other periods and for other station selections, 
but they are usually largest in the Tropics and smaller (but seasonably depen- 
dent) in the Extratropics. This reflects a strong dependence of moisture forecast 
errors, and probably of observation errors as well, on the amount of moisture 
present in the atmosphere and on its local variability. To see this more clearly, 
it is helpful to look at individual time series of station data and corresponding 
6/i-forecasts. 
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[Figure 3 about here.] 


We plotted in Figure 3 the January 1998 mixing ratio observations, forecasts, 
and residuals, for three pressure levels at Singapore. The lower panel shows 
that the 850hPa forecasts are consistently drier than the observations. Most of 
the dots (observations) are above the thin curve (forecasts), and accordingly the 
residuals (thick curve) tend to be positive. At higher levels the situation is not 
as obvious. The forecasts at 300hPa are too wet during several periods lasting 
a few days or more. This is consistent with the regional mean wet bias in the 
forecast model at this level, but there are also periods with several consecutive 
dry forecasts. 

We have looked at many such plots for different time periods and stations in var- 
ious locations. In contrast with the monthly mean statistics, which are primarily 
a function of region, altitude, and season, the time-dependent characteristics of 
systematic forecast errors are not so easily quantified. They are best described 
as having a tendency to persist for a while: successive 6/i-forecasts often re- 
main either too wet or too dry for a few days or more. Although surely there 
are underlying physical explanations, the onset of such spells seems to occur 
randomly. 

These appear to be manifestations of serially correlated model errors. In a the- 
oretical study, Daley (1992a) used a Kalman filter on a simple one-dimensional 
linear quasi-geostrophic model to examine the potential impact of such errors 
on analysis accuracy. He also pointed out the practical difficulty of distinguish- 
ing between model bias and serially correlated model errors, even though the 
two are conceptually quite different. In our analysis of water vapor mixing ra- 
tio observed-minus-forecast residuals we have seen evidence of both types of 
phenomena. 

Serial correlation of the residuals shows up clearly in the time spectra. We 
plotted normalized power spectra, as a function of wave period, for Tropical, 
Northern-Hemispheric, and Southern-Hemispheric rawinsonde stations in Fig- 
ure 4. There is an excess of power in waves with periods longer than about 
5 days, in each of the regions and at all levels; note that serially uncorrelated 
errors would result in flat spectra. These average spectra were obtained by 
(i) scaling the residuals at each station and at each level, so that the means 
are zero and the standard deviations one; (ii) computing the spectrum of each 
scaled time series; and (iii) averaging the spectra for all time series consisting 
of at least 50 residuals. Since there are gaps in the data, we used a spectral 
analysis algorithm for unevenly spaced data (Press et ai 1992, Section 13.8). 


[Figure 4 about here.] 



We will show in the next section how to use this type of spectral analysis for 
calibrating the parameters in the bias correction algorithm. 


3 On-line bias correction 


Standard analysis methods are bias-blind , in the sense that they ignore biases in 
the first guess field. If the first guess is actually biased, then so will the analysis 
be biased. An unbiased analysis method must therefore include a scheme for 
estimating and removing the first-guess bias. 


3.1 The bias-blind analysis equation 

In a sequential statistical data assimilation system such as GEOS DAS, analy- 
ses are produced at regular (typically 6-hour) intervals. Each analysis combines 
quality-controlled observations with a forecast issued from an initial state de- 
rived from the previous analysis. Symbolically, for k = 1, 2, . . . , 

= w* +K* [w£ -H*w{] , (1) 

where the n- vectors w£, w£ are the analysis and forecast at time £*, respectively, 
and the p- vector w£ contains the observations. The pxn-matrix H* is the 
observation operator, which maps model variables to observables, and K* is an 
nxp-matrix of analysis weights. 

If both forecast and observations are unbiased, and their errors are mutually 
independent, then the optimal (least-squares) analysis obtains for 

K* = P{Hf [h*P'H? + R*]"\ (2) 

with P{ and Ft* the forecast and observation error covariances, respectively 
(Jazwinski 1970). 

The development of adequate covariance models is an area of active research. In 
(linear) theory, p£ and Ft* can be computed explicitly using knowledge of the 
joint probability distribution of model and observation errors. In reality there 
is not nearly enough information available to warrant the staggering expense 
that such a computation would entail. Furthermore, the value of a brute-force 
approach is questionable, since many of the assumptions that render (2) opti- 
mal are incorrect in any case (Dee 1991; Dee 1995). Thus, operational data 
assimilation systems use highly simplified representations of the required er- 
ror covariances. These are usually obtained by a combination of statistical data 
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analysis, careful consideration of multivariate balance requirements, optimal use 
of computational resources, and artful tuning of covariance parameters. 

In this paper we largely ignore these issues, because (1) implies that the analysis 
w£ will be biased if the forecast wjf is biased, regardless of the analysis weights 
K*. See DdS (Section 2) for more discussion of the transfer of forecast bias to 
the analysis, and of the effect of adjusting the covariance models. 


3.2 The unbiased analysis equations 

DdS showed how to produce unbiased analyses in a sequential data assimilation 
system when the forecast is biased. The idea is to provide a running estimate of 
the bias and to correct the forecast accordingly. The result is the replacement 
of (1) by the following two-step algorithm: 

b* = b*_i - L* - b*_x)j , (3) 

< = (w{ - £*) + K* [w° k - H t (w{ - b k )] . (4) 

The n- vector 6* is the estimated forecast bias at time <*. The nxp-matrix 
Lit, which we will specify below, defines the weighting coefficients for the bias 
update equation. Note that (4) and (1) are identical when b* = 0. 

The two requirements for the analysis w£ to be unbiased are that (i) the obser- 
vations w£ are unbiased, and (ii) b*-i is an unbiased prediction of the forecast 
bias at tk- This statement follows simply from linearity and holds regardless 
of the specification of the gain matrices L* and K The second condition 
is reasonable when forecast errors tend to persist at fixed locations. It is not 
reasonable, for example, when the errors are dominated by large, systematic dis- 
placements occurring on synoptic time scales. If a better (e.g., state-dependent) 
prediction of forecast bias is available, then it can replace b*-i in (3). 

Whether b*_i provides an unbiased estimate of the forecast bias at f* also de- 
pends on the data coverage, both in space and time. Clearly, if a particular 
region remains unobserved for a while, then it is not possible to obtain a mean- 
ingful bias estimate there, unless additional information (for example, about 
the spatial structure of the forecast bias) is used for the bias prediction. As it 
stands, the bias estimate generated by (3) will remain constant in regions devoid 
of observations. It is therefore important to control the impact of an occasional 
observation in a poorly observed region. This can be done by relaxing the bias 
estimate to its initial state in the absence of observations, or by careful data 
selection. With some abuse of notation, therefore, the observations w£ (and 
associated H fc ) used in (3) and (4) may be different. 





Lacking any a priori information about forecast bias, we take 

b 0 = 0. (5) 

Consequently the bias estimate will remain zero in unobserved regions, and the 
analysis produced by (4) will differ from the bias-blind analysis (1) only where 
data exist. If clearly discernible permanent spatial structures show up in the 
bias estimates, and there is reason to believe they can be extrapolated, then (5) 
should be modified accordingly. 

We show in Appendix A that, within reasonable approximation, the optimal 
weights for the bias estimator (3) are 

U = P b k n T k [H*P*Hj + U k P{Hl + R k ] , (6) 

where P£ is the error covariance of the bias estimate b*_i. Since the observa- 
tions used for the bias estimation are used again in the analysis equation (4), 
the best choice of analysis weights K* is not obvious. However, in the Appendix 
we derive the remarkable result that, if L* is defined by (6), then the optimal 
analysis weights for (4) are still defined by (2). 

The cost of solving the unbiased analysis equations is roughly double that of 
computing the standard, bias-blind analysis. Any analysis system designed 
to solve (1) can be used to solve both (3) and (4), simply by changing the 
background field and the analysis weights. Possible ways to economize are (i) 
to use only a subset of the observations for the bias estimation; (ii) to estimate 
the forecast bias at a reduced spatial resolution; (iii) to update the forecast bias 
estimates less frequently. 


3.3 Specification of the error covariances 

The bias estimator requires specification of the error covariances P£ of the 
bias estimates, in addition to the forecast and observation error covariances 
P{ and Rjt that are needed for the analysis. We are primarily interested in 
incorporating the unbiased analysis equations into an existing operational data 
assimilation system. Initially, therefore, we regard the forecast and observation 
error covariances as given, and use a very simple model for the bias estimation 
error covariance. Our approach is to first concentrate on reducing the mean 
analysis errors; once that has been achieved we can hope to further improve the 
analyses by introducing better covariance models. 

DdS proposed the following model for the error covariances of the bias estimates: 

P£=7P l, (7) 
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with 7 constant. This model assumes that the spatial correlations of the bias 
estimation errors are identical to those of the random component of the forecast 
errors, and (in the multivariate case) that the two types of errors are balanced in 
the same way. This is an attractive starting point since it takes full advantage of 
the effort invested in formulating and implementing a forecast error covariance 
model that produces reasonable results in an operational setting. There are 
obvious ways to generalize, for example, by allowing 7 to depend on space 
and/or time, or by adjusting the correlation models incorporated in P{. 

To arrive at a means for determining an appropriate value for the parameter 
7, we study the behavior of the bias estimator at a single observation location 
that coincides with a model grid point. The bias gain (6) is then 

Ljt = A = al/{a 2 b + o) + 0> (8) 

with crfc , cr / , and o Q the error standard deviations for the bias estimate, the 
forecast, and the observation, respectively, which we take to be stationary for 
the moment. Equation (3) becomes 

i>k - bk - 1 - A[u;Jfc - (u>{ - b*_i)] 
k-l 

= -A 53(1 -A yvk-j, (9) 

j = 0 

with Vk = and we used bo = 0. 

In case of a constant forecast bias b* = b it is easy to show from (9) and the 
assumption that observations are unbiased that lim*_>oo(&Jfc) = b when 0<7< 
2. Therefore the mean bias estimate over any sufficiently long time interval 
will converge to the mean forecast error over that interval. More generally, 
the asymptotic first-moment properties of an estimator for a linear, stationary 
system are not sensitive to the covariance specifications, as long as the system 
is completely observable and controllable (Jazwinski 1970, Section 7.6). Correct 
specification of the error covariances only improves the rate of convergence to 
the asymptotic estimate. This is consistent with our earlier statement that 
the analysis equations (3-4) are unbiased regardless of the error covariance 
specifications. The covariances, or the value of 7 in this scalar case, determine 
the response of the estimator to errors at shorter time scales. 

We showed in Section 2 that, in the absence of forecast bias correction, the time 
series of observed-minus-forecast residuals typically have colored spectra. We 
would like to determine a value for the parameter A such that the spectra of 
the bias-corrected observed-minus-forecast residuals become as flat as possible, 
in some well-defined sense. In the time-frequency domain, (9) corresponds to 

0n = Rn( AK, (10) 
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where 3 n , i/ n are the Fourier coefficients for wavenumber n>0 of the time series 
bk , Vk , respectively. The response function i? n is 

fln(A) = -A/{1 - (1 - X)e~ 2 ^% (11) 

with At the time interval between observations. A flat spectrum of Vk + 6* 
corresponds to 

Wn + 0„| = |[1 + i?n(A)]l/ n | = COUSt. (12) 

Clearly (12) cannot be satisfied exactly by manipulating the single free param- 
eter A. Instead we can use A to reduce the energy in the long- wave portion 
of the spectrum of observed-minus-forecast residuals. A practical method for 
estimating A is to compute the average normalized power spectrum P n of the 
residuals for a set of stations (see Section 2), and then to find A that minimizes 
the functional 

/(A) = J> 2 { |[l + i? n (A)]P„|-l} 2 . (13) 

n 

The factor n 2 serves to emphasize the impact on the long-wave portion of the 
spectrum, and we prefer to use average spectra in order to increase the sample 
size. A value of A can thus be computed, say, separately for data at fixed 
pressure levels from stations in selected regions. 


With <r j and o Q given, (7-8) imply 


7 = 


A a) + ul 
1 - A a) 1 


(14) 


which, in conjunction with (7), completes the specification of the bias estimation 
error covariance model. This is sufficient for our present purposes. 


We briefly outline what would be the next step, namely the re-estimation of 
forecast error standard deviations and other covariance parameters, which, after 
all, are likely to change as a result of the introduction of forecast bias correction. 
The bias-corrected observed-minus-forecast residuals are 

V* = w£ -H*(w{ -b fc _i). (15) 

Using (A.2,A.1,A.4,A.7) of the Appendix, 

Vfc = e* - H*(e{ - e*)» ( 16 ) 

where e%, e b k are the errors in the observations, in the bias-corrected forecast, 
and in the bias estimate, respectively. The covariances of the residuals are 
therefore 


<v*vjf> = K k P{Hj + H*P£H l + R*. 


(17) 
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where we used (A.3,A.5,A.6). This relation between the data and the covariance 
models provides the basis for estimating parameters of Pj£, P£, and R* by, for 
example, maximum-likelihood techniques (Dee 1995; Dee and da Silva 1999). 
Parameters of forecast and bias estimation error covariances are probably not 
separately identifiable, so that a model such as (7) will still be needed to close 
the problem. 


4 Implementation in GEOS DAS 


As a first test of on-line forecast bias estimation and correction in an opera- 
tional data assimilation system, we modified the GEOS DAS 2.8 moisture anal- 
ysis and computed data assimilation cycles for two seasons (December 1997- 
February 1998 and June-August 1998). Here we briefly describe the relevant 
characteristics of the system, and then discuss the moisture bias correction ex- 
periments. In order to save space we show results for January 1998 only. Results 
for other months are qualitatively similar, and lead to identical conclusions. 


4.1 Description of GEOS DAS 2.8 

GEOS DAS 2.8 produces global atmospheric data sets at 6-hourly intervals (3- 
hourly for surface fields) on a 2°x2.5° latitude-longitude grid, at 48 vertical 
levels in both pressure and sigma coordinates. The core of the system consists 
of an atmospheric GCM (Takacs et al 1999), the MOSAIC land-surface model 
(Koster and Suarez 1992; Molod and Salmun 1999), the Physical-Space Statis- 
tical Analysis System (PSAS) (Cohn et al 1998) and various interface functions 
including, for example, quality control of observations. The final, assimilated 
data products are obtained from the analyzed fields by means of the incremental 
analysis update (IAU) procedure (Bloom et al 1996). 

Apart from conventional atmospheric observations, the system accepts geopo- 
tential heights retrieved from TIROS operational vertical sounder (TOVS) data, 
cloud-drift wind retrievals, and surface winds obtained from SSM/I wind speed 
data. The only observations of atmospheric moisture entering the system are 
obtained from rawinsonde soundings, although efforts are currently underway to 
implement the assimilation of interactive TOVS moisture retrievals (Joiner and 
Rokke 1999) and SSM/I-derived total precipitable water obtained from NASA's 
Tropical Rainfall Measuring Mission (TRMM) (Hou et al 1999). 

At 6-hourly intervals during the assimilation, a global analysis is computed in 
three steps: first for the moisture field (water vapor mixing ratio), then for the 
upper-air variables (geopotential heights and winds), and finally for the surface 
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variables (sea-level pressure and lOm-winds). In each case PSAS solves the 
analysis equation (1), combining all available observations taken within 3 hours 
of the analysis time with the first-guess fields produced by the GCM. Analysis 
weights are defined by (2), based on prescribed forecast and observation error 
covariance models. The analysis corrections to the first-guess fields are then 
used in the IAU procedure to force the next 6-hour model integration. 

The mixing ratio forecast and observation error variances are such that their 
ratio is a function of pressure only, ranging between a minimum value of 0.76 
(at TOOhPa) and a maximum of 2.8 (at 300hPa). This ratio represents, in a 
scalar analysis, the weight of a single mixing ratio observation relative to that 
of the first-guess value at that location. To complete the covariance model spec- 
ifications, observation errors are assumed uncorrelated in space and time, while 
forecast error correlations are represented by a separable function of horizon- 
tal distance and pressure. The variances and correlation parameters have been 
estimated from observed-minus-forecast residuals by maximum-likelihood tech- 
niques (Dee et ai 1999). Clearly, these exceedingly simple models leave ample 
room for improvement. 


4.2 Description of GEOS DAS BC 

The experimental system GEOS DAS BC is identical to GEOS DAS 2.8 in all 
respects, except that the two-step algorithm (3,4) replaces the moisture analysis 
equation (1). Both steps are solved with PSAS, using the same forecast and 
observation error covariance specifications. For the bias estimation step (3), 
we only use observations in the vicinity of stations that report at least once a 
day. The computational cost of the moisture analysis is then roughly twice that 
in the original system. Since the number of moisture observations is relatively 
small (about 7000 per day), the additional expense is insignificant in the context 
of the total DAS computation. 

The error covariances for the bias estimates, needed to define the weights (6), 
are modeled by (7). We tuned the parameter 7 using the spectral estimation 
procedure described in Section 3, separately for each pressure level, from time 
series of GEOS DAS 2.8 observed-minus-forecast residuals restricted to different 
time periods and regions in space. The estimated values varied somewhat, 
tending to be largest at the upper levels, where the serial correlation of forecast 
errors is most pronounced. We ended up using a constant value 7 = 0.22 for 
our experiments, feeling that further refinement may not be worthwhile until 
improved covariance models can be implemented. 


12 



4.3 January 1998 results 


We first examine the time spectra of the GEOS DAS BC observed-minus- 
forecast residuals in order to validate our choice of the parameter 7. As discussed 
in Section 3, the sequential bias estimator acts as a first-order linear filter on the 
data residuals. Its behavior at a single station location can be characterized by 
a response function in the time-frequency domain, and (with forecast and ob- 
servation error variances given) is determined by the parameter 7. An optimal 
analysis scheme would produce white (serially uncorrelated) observed-minus- 
forecast residuals (Daley 1992b). Figure 5 shows that the regionally averaged 
spectra of the GEOS DAS BC residuals are in fact much flatter than those for 
GEOS DAS 2.8 (shown in Figure 4). 


[Figure 5 about here.] 


In an earlier experiment with 7 = 0.5 we found that the spectra had too little 
power in the low wave numbers (that is, the opposite of Figure 4). The removal 
of forecast and analysis bias was equally effective in that experiment (in the 
monthly-mean sense), but the random component of forecast error was slightly 
larger. This is consistent with the scalar theory presented in Section 3, and sup- 
ports our contention that the analyses are unbiased regardless of the covariance 
specifications. However, increasing the value of 7 has the effect of contaminat- 
ing the bias estimates with noise, and this deteriorates the analyses even if it 
does not change their time-mean properties. We find our procedure for esti- 
mating the parameter 7 based on a spectral analysis of observed-minus-forecast 
residuals to be quite effective. 


[Figure 6 about here.] 


We now take a look at the impact of the forecast bias correction at a single 
station. Figure 6 shows time series data at 300hPa, 500hPa, and 850hPa for 
the Singapore rawinsonde station; this should be compared with Figure 3. The 
thick solid curves are the corrected observed-minus-forecast residuals w£ - (w£ — 
bfc_i). The dashed curves show the forecast bias predictions b*_i; they indi- 
cate that the 6/1 forecasts tend to be too wet at the upper levels and too dry 
below. The bias estimates vary slowly with time; adjustments are made when- 
ever there are persistent differences between the bias-corrected forecasts and the 
observations. The bias estimates may remain quite small for extended periods, 
for example at 850hPa between January 14-25. 


[Figure 7 about here.] 
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Residual time series at a single station fluctuate wildly because of random (or 
small-scale) observation and forecast errors. It is easier to see the impact of the 
bias correction by averaging over several stations. For the Indonesian region, 
for example, we show in Figure 7 the time evolution of the mean observed- 
minus-forecast residuals, for GEOS DAS 2.8 (thin curves) and for the modified 
system (thick curves). A point on each curve is obtained simply by averaging 
the residuals at all stations in the region. Since the bias-corrected forecasts 
(w{ -b*_i) are truly forecasts, in the sense that they do not depend on data 
at these plots demonstrate that the algorithm is able to predict the forecast 
bias relative to future observations. 


[Figure 8 about here.] 


Figure 2, briefly discussed in the Introduction, shows the monthly means and 
standard deviations of both the observed-minus-forecast and observed-minus- 
analysis residuals, for four different regions. Again, these statistics apply to the 
bias-corrected 6h - forecasts (w£ they show that, by being able to predict 

the forecast bias, the algorithm succeeds in producing unbiased analyses. An 
interesting question is whether the improved analyses have a positive impact on 
the uncorrected forecasts w £ as well. This is by no means self-evident, since 
the mechanisms by which model errors are generated are not well-understood, 
although almost certainly nonlinear. Figure 8 shows that, in most cases, the 
mean errors in the uncorrected forecasts produced from the unbiased analyses in 
GEOS DAS BC are in fact much smaller than the errors in the GEOS DAS 2.8 
forecasts. This proves that the improvements in the unbiased GEOS DAS BC 
analyses do in fact result in significantly better (in the mean sense) short-term 
forecasts, at least in observed regions. 

In Figure 9 we show an example of a moisture forecast, the forecast bias estimate, 
and the analysis produced by the algorithm. This is a snapshot for a particular 
region (Indonesia), level (300hPa), and synoptic time (OZ January 1 1998). 
The bias estimate represents a spatial and time average of recent observed- 
minus-forecast residuals at the stations shown. The estimate indicates that the 
forecasts at that level have been consistently wet lately, according to rawinsonde 
soundings. The analysis further adjusts the bias-corrected forecast based on the 
current observations. 


[Figure 9 about here.] 


The spatial structure of the forecast bias estimate shown in Figure 9 clearly 
reflects the distribution of station locations. There is simply no information in 
our scheme about forecast bias other than that contained in the observations. 
Far enough away from station locations the bias estimates remain at their initial 
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values, which are zero in our experiments. This limits the total impact of the 
bias correction on the global moisture analysis. Furthermore, the final GEOS 
DAS data products are based on the output of the IAU procedure, which consists 
of an integration of the GCM forced by the analysis increments. It is conceivable 
that the effect of the bias correction on the assimilation, even in the vicinity of 
data locations, is modified by the IAU due to the influence of the GCM. 

To illustrate these points more clearly, the upper left panel of Figure 10 shows 
the January 1998 total precipitable water (TPW) according to the GEOS DAS 
2.8 assimilation. The upper right panel shows the relative error in this assim- 
ilated data set with respect to TPW derived from SSM/I data (Wentz 1997). 
These estimates are available over the oceans only, and since we use no mois- 
ture data other than those obtained from rawinsonde soundings, the errors are 
quite large. This plot gives an indication of the degree of uncertainty in the 
GEOS DAS moisture estimates. The lower left panel shows the relative differ- 
ence between the GEOS DAS 2.8 assimilated TPW and the TPW computed by 
vertically integrating the GEOS DAS 2.8 BC moisture analyses. We regard this 
as a measure of the total impact of the forecast bias correction on the moisture 
analyses. The impact is not very large (compared to the uncertainty implied 
by the Wentz data) and mostly restricted to land. The lower right panel shows 
the relative impact of the bias correction on TPW in the assimilation, which 
appears to be slightly damped by the IAU procedure. 

[Figure 10 about here.] 


5 Conclusions 


The object of an analysis is to make the best possible use of the available ob- 
servations, given a model forecast and whatever is known about the errors. The 
observations indicate that model errors tend to persist. This means that the 
data contain useful information about likely errors in subsequent forecasts. A 
human forecaster would take advantage of this information, being more natu- 
rally inclined to think in terms of systematic rather than random model errors. 
Of course, the representation of model errors by zero-mean w'hite stochastic 
processes in an automated analysis algorithm is strictly a mathematical device, 
inspired by computational convenience rather than empirical knowledge. 

The unbiased analysis method, introduced in DdS and tested here in the context 
of the GEOS DAS moisture analysis, includes a simple scheme for estimating 
forecast bias. Analyses are produced as usual, but only after removing the bias 
from the forecast. The method relies completely on observations for estimating 
the bias, although it can be easily generalized to incorporate any additional 
information about forecast bias that may be available. As explained in DdS, 
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forecast bias estimation is a data assimilation problem in its own right; it re- 
quires unbiased observations (in this case, rawinsonde soundings) supplemented 
by a model for the bias (in this case, persistence). 

This view clearly exposes the main limitations of the method. First, exclusive 
reliance on observations can be dangerous, since it presumes that effective qual- 
ity control and bias correction algorithms for the observations are in place. In 
practice, forecast bias estimates at individual stations must be carefully moni- 
tored in order to detect problems with the observations themselves. It is also 
important to control the impact of an occasional observation in a poorly ob- 
served region. Second, in the absence of additional information about forecast 
bias, the estimates are only meaningful when and where observations are regu- 
larly available. This limits the impact of the bias correction in a global analysis 
system, as long as rawinsonde soundings only are used to estimate the bias. 
Ultimately, of course, observations that are deemed sufficiently accurate for an 
analysis should be useful for bias estimation as well. In the moisture case, for 
example, we intend to investigate the use of TPW retrievals to obtain nearly 
global estimates of tropospheric moisture bias. 

Our experiments show that the method is extremely effective in predicting mois- 
ture forecast bias based on recent observed- minus-forecast residuals. It is there- 
fore able to produce unbiased moisture analyses at all levels in the observed 
regions. This leads to improved initial conditions for the forecast model and, 
as a result, a reduction in forecast errors. Specifically, we showed that the 6h- 
forecast bias is significantly reduced in many cases. Having eliminated analysis 
bias allows one to identify mean forecast errors with mean model errors, and 
so the bias estimates produced by the method are in fact estimates of mean 
model error. These estimates could prove very useful in quantitative studies of 
model error, which are important both for model development and for advances 
in data assimilation. 

The upshot of our method is that it uses observations more effectively, by remov- 
ing the assumption that the forecast model is unbiased. It might be argued that 
this approach offers only a temporary solution taylored to a poorly performing 
model. Surely, errors will get smaller as models and data improve. However, 
requirements for analysis and forecast accuracy can be expected to increase ac- 
cordingly. The need for analysis methods that efficiently extract small-scale 
information from observations will grow. Such methods must involve adequate 
descriptions of model errors arising from the treatment of topography, convec- 
tive parameterizations, etc. It is not at all obvious that the most effective 
approximation will represent model errors by zero-mean white noise. In fact, 
we believe that the treatment of systematic discrepancies between model and 
data will be even more important in future high-performance data assimilation 
systems. 

We plan to extend this work to the complete GEOS DAS analysis system. There 
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is evidence of significant biases and serially correlated model errors in data 
residuals for all atmospheric variables. The main problems to be addressed in 
this context are the data selection from nonstationary observers, and a reduction 
of computational expense. \Ve also hope to be able to produce forecast bias 
estimates for different lead times, and examine whether, say, 24h forecast skill 
can be improved by making use of these estimates in real time. Finally, we 
would like to explore ways to use the forecast bias estimates for identifying 
specific sources of model error, and thereby help improve prediction models. 
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A Optimal gains 


In order to make claims about consistency and optimality of an estimator we 
need to be precise about what we wish to estimate and about the errors in 
our sources of information. The unknowns at time £* are w£, the true state 
of the atmosphere, and b*, the deterministic component of forecast error. Our 
assumptions on forecast and observations are 

w{=w'+b*+e{, <«/) = 0, (d(d) T ) = P£ (A.1) 

wg = H*wi + «2, (e° k )=0, (e° k ( £ ° k ) T ) = R k (A.2) 

(e° k (e{) T )=0 (A.3) 

and that both noise sequences are white. With b* = 0 these are the 

standard assumptions for the derivation of the Kalman filter (Jazwinski 1970, 
Section 7.3). Bias parameters were first introduced by Friedland (1969), who 
derived a sequential bias estimator which is closely related to our unbiased 
analysis equations. See DdS for a detailed discussion of the relationship between 
Friedland’s algorithm and ours. 

We further assume that a prediction b£ of the bias b* is available such that 

b p k = b k + el (4) = 0, (4(4) T ) = P* (A.4) 

{e° k (e b k ) T )=0 (A.5) 

(d(4) T ) = 0 (A.6) 

Assumption (A.5) is analogous to (A.3); it is clearly true for white observation 
errors when b£ is a prediction in the true sense of the word, that is, an estimate 
that does not depend on future data. The cross-covariance assumption (A.6) is 
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more troublesome, however, and should be regarded as an approximation. In 
particular, for our unbiased analysis equations (3-4) the bias prediction is 

b£ = b fc _!. (A. 7) 

The forecast and the bias prediction b*_i both depend on data at £*-i- 
Clearly, therefore, their respective errors are not independent. On the other 
hand, the forecast errors involve predictability and model errors as well, so 
we expect that the neglected cross-covariances are relatively small. 


Substitution of (A. 1, A. 2, A. 4, A. 7) into (3) gives 


b* = b, + [I - L*H*] *1-1* 



(A.8) 


which shows that b * is an unbiased estimate of b* for any L*. It is also straight- 
forward to show, using the second-moment assumptions in (A.1-A.6), that the 
best (minimum-variance) linear unbiased estimate obtains when L* is given 
by (6). Optimality depends on the correct specifications of all error covari- 
ances; in practice, of course, this will not be the case. In particular, as we 
already pointed out, (A. 6) is an approximation. 


It is not at all obvious, however, that the analysis weights K* defined by (2) 
are optimal. After all, the observations w£ are used twice: first, for estimating 
the bias, and second, for computing the analysis. We can write the analysis 
equation (4) as 

Wjfc = - bfc, (A. 9) 

w fc = + K * [ w * - H*w£] . (A. 10) 


The unbiased analysis (A. 10) combines two sources of information — a bias- 
corrected forecast and a set of observations -whose errors are not independent. 
In fact, using (A. 9), (A.l), and (A.8), 


= - b* - w£ 

= + b k - b* 

= [l-L k H k ](e{-e b k ) + Uel. 


(A-ll) 


The optimal analysis weights, properly accounting for the cross- covariances, are 
(Jazwinski 1970, Section 7.3, Example 7.5) 


Kjt 


pM - **] [h*P {Hi + R* - H k Xj - XtHf] 


-1 


(A. 12) 
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where 


P*' = (HtH) T ) ■ (A 13) 

X t = {tJ(ef> T )- (All) 

With L fc defined by (6), it turns out that (A. 12) and (2) are identical. The 
remainder of this Appendix proves this statement. 


To simplify notation, we omit the subscript k , and define 
P = P 6 -hP / , 

S = HPH r + R. 

Then (A. 11) and (6) imply that 

P / = [I - LH] P [I - LH] t + LRL r 

= P - P 6 H t S“ 1 HP / - PH T S _1 HP k , 

and that 

X = RL 7 

= RS _1 HP 6 , 


(A. 15) 
(A. 16) 


(A. 17) 


(A.18) 


where we used (A.1-A.6). The first factor of (A. 12) is 

P'H t - X T = PH r - P 6 H T S- 1 HP / H r - PH T S _1 HP k H r - P 6 H T S _1 R 

= PH r - P 6 H t S _ 1 [HP / H r + HP 6 H t + R] - P / H T S" 1 HP 6 H T 
= PH r - P 6 H t - P / H t S- 1 HP i, H t 
= P / H t [I - S _1 HP fc H r ] 

= P / H t S 1 [HP / H t + R] . 

(A. 19) 


The inverse of the second factor is 
HP f H T + R - HX t - XH r 

= HPH t - HP 6 H r S _1 HP / H T - HPH t S _1 HP 6 H t 
+ R - HP 6 H t S _1 R - RS _1 HP 6 H t 
= S - HP 6 H t S - 1 [HP y H T + HP k H r + R] 

- HP / H t S _1 HP 6 H t - RS _1 HP k H r 
= S - HP k H r - [HP^H t + R] S _1 HP*H t 
= S - 2HP k H T + HP k H T S _1 HP 6 H T 
= [I - HP 6 H T S‘] [S - HP 6 H r ] 

= [I - HP 6 H t S“ 1 ] [HP / H t + R] . 

(A. 20) 
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Combining the two factors, 

K = p / h 7 s ~ 1 [hp'h^ + r] [hp / h t + r ] _1 [i-hp^s -1 ] -1 

= P / H r S _1 [I - HP 6 H t S _1 ] -1 
= P / H r [S -HP (, H r ] _1 

= P'H 7 [hp / h t + r ] 1 . 

(A. 21) 
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Figure 1: Means (solid) and standard deviations (dashed) of atmospheric wa- 
ter vapor mixing ratio observed-minus-forecast (top panels) and 
observed-minus-analysis (bottom panels) residuals, as a function of 
pressure, for four different regions. Observations consist of quality- 
controlled rawinsonde reports; 6/i-forecasts and analyses were pro- 
duced by GEOS DAS 2.8. 
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Figure 2: As Figure 1, but for GEOS DAS BC. For reference, the GEOS DAS 
2.8 observed-minus-forecast standard deviations are reproduced in the 
top four panels (dotted curves). 
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Figure 3: Water vapor mixing ratio [g/kg] observed-minus-forecast residuals 
(thick solid curves), rawinsonde observations (dots), and 6/i-forecasts 
(thin solid curves) at 300hPa, SOOhPa, and 850hPa in Singapore, for 
the month January 1998. The scales for the residuals are on the left; 
those for the observed and forecast values on the right. The empty 
dot in the center panel represents an observation that was rejected by 
quality control. 


27 






Figure 4: Average normalized power spectra of water vapor mixing ratio 
observed-minus-forecast residual time series, for January 1998 rawin- 
sonde reports in the Tropics (solid), Northern Hemisphere (dashed), 
and Southern Hemisphere (dotted). Horizontal axis is wave period, 
in days; vertical axis is normalized power. White noise would have a 
flat spectrum with power 1 at all wavelengths. 
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Figure 6: As Figure 3, but for GEOS DAS BC. The dashed curves in each panel 
correspond to the forecast bias estimates at the station location; scales 
for these estimates are indicated on the left vertical axes. 
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Figure 7: Time evolution of mean mixing ratio [g/kg] observed-forecast residuals 
for Indonesian rawinsonde stations, at 300hPa, 500hPa, and 850hPa. 
Forecasts are from GEOS DAS 2.8 (thin curves) and GEOS DAS BC 
(thick curves). 
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Figure 8: Monthly mean mixing ratio [g/kg] observed-minus-forecast residuals 
for rawinsonde stations in four different regions. Forecasts are from 
GEOS DAS 2.8 (thin solid curves), and from GEOS DAS BC before 
(dashed curves) and after (thick solid curves) bias correction. 
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Figure 9: Station locations, 300hPa moisture forecast, bias estimate, and final 
moisture analysis near Indonesia for January 1 1998 at OZ. Contour 
values range from 0 g/kg (lightest) to 0.8 g/kg (darkest) by steps of 
0.1 g/kg. 
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Figure 10: January 1998 mean total precipitable water (in mm) in the GEOS 
DAS 2.8 assimilation (upper left panel); the relative error (in per- 
cent) of the GEOS DAS 2.8 assimilation with respect to the Wentz 
retrievals (upper right panel); the relative impact (in percent) of the 
forecast bias correction on total precipitable water in the GEOS DAS 
2.8 BC moisture analyses (lower left panel); and the relative impact 
(in percent) of the forecast bias correction on total precipitable water 
in the GEOS DAS 2.8 BC moisture assimilation (lower left panel). 


34 






